If the number of factors in an experiment is increased, does it
necessarily follow that the size of the experiment must increase to achieve a
satisfactory  analysis?  In  some  common  situations  the  answer  is  No.  The 
present  paper  discusses  a  model  suggested  by  medical  diagnostic  problems  in 
which  the  answer  is  Yes:  indeed,  the  increase  in  size  is  exponentially 
fast.  The  conclusion  is  drawn  that  statisticians  should  be  cautious  before 
embarking  on  the  study  of  data  with  large  numbers  of  factors  because  the  data 
may  be  inadequate  for  a  sensible  analysis.  The  basic,  mathematical  tool  is 
the  Kullback-Leibler  number  which  measures  the  discrimination  between  the 
possibilities.  Calculation  of  these  numbers  uses  interactions,  forming  a 
basis for all the effects that might occur.

SIGNIFICANCE AND EXPLANATION


Suppose  that  on  each  of  a  number  n  of  experimental  units  the  values 
of  m  factors,  or  variables,  are  measured:  for  example,  n  patients  may  be 
tested  for  the  presence  of  m  symptoms.  Then  it  seems  intuitively  reasonable 
that  as  the  number  of  factors  increases  it  may  be  necessary  to  observe  more 
units  in  order  to  understand  the  influence  that  the  factors  have.  In  many 
situations  studied  in  statistics  this  is  not  so.  In  the  present  paper  a 
contrary  case  is  considered  in  which  the  growth  in  n  needs  to  be 
exponentially fast in m; for example, each extra factor may mean a 25%
increase  in  the  number  of  units  needed.  The  case  is  suggested  by  medical 
diagnostic  problems,  though  as  a  description  of  the  medical  reality  it  has  a 
number  of  defects.  Nevertheless  the  general  moral  that  increasing  complexity 
of  factors  means  more  data  is  surely  true  there. 

The  practical  import  of  the  results  is  that  scientists  and  their 
statistical  advisors  should  be  wary  of  handling  data  sets  with  a  large  number 
of  factors  simply  because  n  may  not  be  large  enough  to  permit  reliable 
conclusions.  The  tendency  these  days  to  try  to  make  sense  of  a  lot  of  data  - 
a  tendency  made  possible  by  the  advent  of  powerful  computers  -  should 
sometimes  be  resisted.  More  thought  should  go  into  the  sensible  design  of 
experiments,  and  the  statistician's  role  should  not  be  confined  to  data 
analysis,  which  may  be  a  hopeless  task. 

The  mathematical  analysis  uses  a  concept  of  distance  between  the  various 
possibilities  that  might  explain  the  data.  This  distance  is  called  a 
Kullback-Leibler  number.  In  the  model  studied  the  number  of  possible 
descriptions gets to be so large that they crowd together even in the
wide-open regions of m-space and hence become close together and difficult
to discriminate. In other situations this crowding does not occur. The
mathematical calculations are closely related to those that arise in the
study of error-correcting codes: for if two messages are too close they
also cannot be separated.


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


THE  RELATIONSHIP  BETWEEN  THE  NUMBER  OF  FACTORS 
AND  SIZE  OF  AN  EXPERIMENT 

D. V. Lindley

1. Introduction

With  the  growth  in  computing  power  statisticians  have  turned  their 
attention  to  data  sets  involving  a  large  number  of  factors,  or 
variables.  In  this  paper  we  consider  the  relationship  between  n,  the 
size  of  the  data  set, or  experiment,  and  m,  the  number  of  factors. 
Intuitively  it  appears  plausible  that  as  the  number  of  factors  increases 
so  should  the  size  of  the  experiment  in  order  to  unravel  the  increasing 
complexity  that  could  arise  with  many  factors.  The  following  example 
shows  that  this  is  not  necessarily  so. 

In an experiment in which all factors are at two levels and a
complete factorial arrangement is used, any contrast is a mean of $2^{m-1}$
values against a similar mean and has variance $2^{-(m-2)}$ times that of
any unit. Hence the variance of any contrast per unit of
experimentation is 4, irrespective of m, and the information gained
about the contrast per unit is the same irrespective of the number of
factors involved. In this sense the number of factors does not
influence the need for more data.
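
The arithmetic behind this claim can be checked with a few lines of code. The sketch below assumes unit variance for each observation and one observation per treatment combination (so that the experiment has 2^m units); the function name `contrast_variance` is illustrative only and does not come from the report.

```python
from fractions import Fraction

def contrast_variance(m: int) -> Fraction:
    """Variance of a contrast in a complete 2^m factorial, unit-variance observations.

    A contrast is the difference between two means, each over 2^(m-1) units,
    so its variance is the sum of the variances of the two means.
    """
    half = 2 ** (m - 1)
    return Fraction(1, half) + Fraction(1, half)

for m in range(1, 7):
    units = 2 ** m
    # Variance per unit of experimentation: contrast variance times number of units.
    print(m, contrast_variance(m), contrast_variance(m) * units)
```

Under these assumptions the last printed column does not change with m, which is the sense in which extra factors cost nothing in this design.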

Another  example  arises  with  the  multivariate  normal  distribution 
with  n  observations  on  each  of  m  variables.  Each  mean  has  n 
values to estimate it, as has each variance. Equally each covariance
has n pairs of values available. Again the number of variables might
appear  to  have  no  effect.  A  case  with  m  much  larger  than  n  is  worth 
contemplating.  This  case  is  not  so  clear-cut  since  standard 
multivariate  theory  has  difficulties  whenever  n  is  less  than  m. 

In this paper we study a model and show that there n does have to
increase with m; indeed the increase is exponentially fast in order
that the many possibilities introduced by an increase in m can be
investigated adequately. The model was suggested by problems of medical
diagnosis where upwards of 40 symptoms (factors) may be observed on a
patient and it is required to diagnose the disease from these.
Typically the presence or absence of a symptom is not too well-defined
and the model allows for errors in this regard. We do not pretend to
have made a contribution to actual diagnosis but merely to indicate the
difficulties that might arise when many symptoms are studied.

2.  The  Model 

All quantities are binary, taking the values plus or minus one, or
simply + or -. There are m binary factors $\xi = (\xi_1, \xi_2, \ldots, \xi_m)$
and dependent on them is a binary response $\eta$ described by a response
function $\eta = \delta(\xi)$. For each unit in the data set every $\xi_i$ has
chance 1/2 of being + independently of the other factors: thus each $\xi$ has
chance $2^{-m}$. Each of the factors and the response for a unit are
observed with a possible error. $\xi_i$ is observed as $x_i$ and $\eta$ as $y$
with $p(x_i \neq \xi_i) = p(y \neq \eta) = p$, $p < 1/2$. These errors are
independent and independent of the factors or response. We write
$x = (x_1, x_2, \ldots, x_m)$.

On the basis of observations $z = (x, y)$ on each of n units it is
required to make inferences about $\delta$, the response mechanism. A few
comments on the model are now offered.
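
As a concrete reading of the model, the following sketch draws observed units z = (x, y). It is only an illustration: the names `simulate_unit` and `main_effect_of_first_factor` are hypothetical, +1 and -1 stand for + and -, and the example response is chosen arbitrarily.

```python
import random

def simulate_unit(delta, m, p, rng):
    """Draw one observed unit z = (x, y) from the model of this section.

    delta: function mapping a tuple of m values in {+1, -1} to +1 or -1.
    p: chance that any single recorded value is in error (p < 1/2).
    """
    # True factors: each is + with chance 1/2, independently.
    xi = tuple(rng.choice((1, -1)) for _ in range(m))
    eta = delta(xi)  # true response

    def record(value):
        # Each recorded value is flipped independently with chance p.
        return -value if rng.random() < p else value

    x = tuple(record(v) for v in xi)
    y = record(eta)
    return x, y

def main_effect_of_first_factor(xi):
    """Hypothetical example response: the response equals the first factor."""
    return xi[0]

rng = random.Random(0)
data = [simulate_unit(main_effect_of_first_factor, m=3, p=0.1, rng=rng) for _ in range(5)]
for x, y in data:
    print(x, y)
```

Only x and y are returned, mirroring the fact that the true factors and response are never seen by the analyst.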

The structure through $\xi$'s and $\eta$ measured with error seems
reasonable for many situations (such as medical diagnosis) where the view is
that  were  the  correct  quantities  to  be  measured  the  response  would  be 
determined.  It  therefore  differs  from  the  factorial  situation  mentioned 
above  in  that  there  are  errors  in  the  factors  (economists  speak  of 
errors-in-variables  models).  Where  the  model  is  seriously  deficient  as 
a  description  of  reality  is  in  the  chance  structure  imposed  for 
simplicity in the subsequent analysis. It might be possible to have
$\xi_i$ (and hence $x_i$) equally likely to be + or - by defining the
factors  suitably:  for  example,  a  continuous  variable  could  be 
dichotomized  at  the  median.  But  it  is  unrealistic  to  assume  that  the 
factors  are  independent.  Our  reason  for  doing  so  is  that  we  do  not  know 
of  a  simple  way  of  describing  adequately  dependence  amongst  binary 
factors:  indeed,  this  is  a  major  problem  in  a  satisfactory  statistical 
treatment  of  diagnosis.  Another  defect  is  that  the  errors  have  the  same 
chances  for  each  factor.  In  the  medical  situation  this  is  not  true: 
age  is  more  accurately  determined  than  blood  pressure.  Different 
chances  could  be  studied  with  the  penalty  of  a  substantial  increase  in 
algebraic  and  computational  complexity.  Our  defense  of  the  model  is 
that  our  interest  lies  primarily  in  the  relationship  between  m  and 
n as far as determining $\delta$ is concerned, and that the general form of
this relationship might not be too disturbed by more realistic modelling
of  the  chances.  The  exact  numerical  values  calculated  below  should  not 
be  taken  seriously:  but  we  hope  that  their  orders  of  magnitude  might 
be. 

It might be suspected that problems would arise as m increases
since the number of $\delta$'s is $2^{2^m}$, making the determination of
$\delta$ more difficult. Notice that the model has no parameters and the
argument is essentially non-parametric.

3. Discrimination between response functions

Let $\delta_1$, $\delta_2$ be any two different response functions carrying
$\xi$ into $\eta$ and at some stage of the experiment let $p(\delta_i)$ denote
the probability attached to $\delta_i$. If an additional unit is observed to
have $z = (x, y)$, then the log-odds for $\delta_1$ against $\delta_2$ will become

$$\log \frac{p(\delta_1 \mid z)}{p(\delta_2 \mid z)} = \log \frac{p(z \mid \delta_1)}{p(z \mid \delta_2)} + \log \frac{p(\delta_1)}{p(\delta_2)}$$

by Bayes theorem. Conditional on $\delta_1$, the expected change in log-odds
will be

$$\Delta(\delta_1 : \delta_2) = \sum_z p(z \mid \delta_1) \log \frac{p(z \mid \delta_1)}{p(z \mid \delta_2)} \tag{3.1}$$

a Kullback-Leibler number. (Notice that in the notation the first
argument of $\Delta$ is the $\delta$ for which the expectation is being
calculated - the "true" value - whereas the second argument is the
alternative value. In general $\Delta(\delta_1 : \delta_2) \neq \Delta(\delta_2 : \delta_1)$.)
We sometimes abbreviate to $\Delta_{12}$.

As more units are added the log-odds change by the addition of the
appropriate log-likelihood-ratio, and the expected change for n units
is $n\Delta_{12}$. To achieve a prescribed level of discrimination between
$\delta_1$ and $\delta_2$ when $\delta_1$ obtains we would need the log-odds to
attain a prescribed level R and expect to see n units to achieve this where
$R = n\Delta_{12}$. Hence the sample size required is proportional to
$\Delta_{12}^{-1}$. For a given $\delta_1$ the discrimination is most difficult
and requires the largest sample size for the alternative $\delta_2$ that
minimizes $\Delta(\delta_1 : \delta)$ over all $\delta$ not equal to $\delta_1$.
Since $\delta_1$ is itself initially unknown, the size of experiment is
related to the least Kullback-Leibler number $\Delta(\delta_1 : \delta_2)$ over
all unequal $\delta_1$, $\delta_2$.

In the discussion that follows it will be supposed that all $\delta$'s
are initially equally likely so that the original log-odds are zero. If
this is not so then the appropriate log-odds should be added to $n\Delta_{12}$
before the evaluation.

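The Kullback-Leibler numbers are found by first calculating

$$E_{12} = E(\delta_1 : \delta_2) = \sum_z p(z \mid \delta_1) \log p(z \mid \delta_2) \tag{3.2}$$

for all $\delta_1$, $\delta_2$ including $\delta_1 = \delta_2$: then
$\Delta_{12} = E_{11} - E_{12}$.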
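
For small m these quantities can be computed by direct enumeration. The sketch below is a brute-force illustration, not the method developed in the later sections: it evaluates p(z | delta) by summing over all true factor combinations, then forms E and the Kullback-Leibler number. The function names, the two example responses and the target log-odds `R` are all assumptions made for the example.

```python
import itertools
import math

def likelihood(z, delta, m, p):
    """p(z | delta): average over the 2^m equally likely true factor settings."""
    x, y = z
    q = 1 - p
    total = 0.0
    for xi in itertools.product((1, -1), repeat=m):
        # s counts recording errors: factor disagreements plus a response disagreement.
        s = sum(a != b for a, b in zip(x, xi)) + (y != delta(xi))
        total += p ** s * q ** (m + 1 - s)
    return total / 2 ** m

def E(d1, d2, m, p):
    """E(d1 : d2) = sum over z of p(z | d1) log p(z | d2), equation (3.2)."""
    zs = [(x, y) for x in itertools.product((1, -1), repeat=m) for y in (1, -1)]
    return sum(likelihood(z, d1, m, p) * math.log(likelihood(z, d2, m, p)) for z in zs)

def kl(d1, d2, m, p):
    """Kullback-Leibler number with d1 as the true response."""
    return E(d1, d1, m, p) - E(d1, d2, m, p)

def first_factor(xi):   # hypothetical true response
    return xi[0]

def second_factor(xi):  # hypothetical alternative response
    return xi[1]

m, p = 2, 0.1
number = kl(first_factor, second_factor, m, p)
R = math.log(100)       # illustrative target: odds of 100 to 1
print(number, R / number)
```

With m = 2 and p = 0.1 the computed number is about 0.485, which agrees with the m = 2 value quoted in section 7. The cost of the enumeration grows like 4^m, so it is useful only as a check for a handful of factors.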
4. Interactions

The evaluation of the $\Delta$'s for general $\delta$ presents difficulties
so we consider first an important subclass of $\delta$'s called
interactions. In applications interest might centre on $\delta$'s that
depend only on a subclass of the m factors. If changing $\xi_i$ has no
effect on $\eta$ whatever be the values of the other factors we say $\xi_i$ is
irrelevant to $\delta$: the remaining factors are called relevant to $\delta$.
The set of relevant factors for $\delta$ is denoted $R(\delta)$.

$\delta$ is an odd (even), k-factor interaction if $R(\delta)$ contains k
factors and $\eta = 1$ iff an odd (even) number of these factors are +.
The odd and even interactions with the same $R(\delta)$ are called
complementary. The 1-factor interactions may be called the main effects
of the single, relevant factor. To complete the situation we need the
two complementary zero-factor interactions in which $\eta = -(+)$ for all
$\xi$. Notice that interactions, although similar to interactions in
factorial experiments, are different in that they occur in complementary
pairs, the one being obtained from the other by interchanging + and -
in the response values. All the odd interactions for 3 factors are
listed in the Table.
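
The definition translates directly into code. The sketch below builds the odd and even interactions for m = 3 and prints the response of each odd interaction at every factor combination; the helper name `interaction` and the ordering of the factor combinations are choices made for this illustration.

```python
import itertools

def interaction(relevant, odd=True):
    """Odd (or even) interaction on the factor positions listed in `relevant`."""
    def delta(xi):
        plus_count = sum(1 for i in relevant if xi[i] == 1)
        is_odd = plus_count % 2 == 1
        # Odd interaction: + iff an odd number of relevant factors are +.
        return 1 if is_odd == odd else -1
    return delta

m = 3
factor_settings = list(itertools.product((1, -1), repeat=m))
# Every subset of the factors, from the empty set (zero-factor) to all m factors.
subsets = [c for k in range(m + 1) for c in itertools.combinations(range(m), k)]
odd_interactions = {c: interaction(c, odd=True) for c in subsets}

sign = {1: "+", -1: "-"}
for xi in factor_settings:
    row = " ".join(sign[odd_interactions[c](xi)] for c in subsets)
    print(" ".join(sign[v] for v in xi), "|", row)
```

Passing `odd=False` gives the complementary even interaction, whose response is the opposite sign at every factor combination.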

In an abuse of notation, a k-factor interaction will often be
denoted $\delta_k$. Two different k-factor interactions will be written
$\delta_k$ and $\delta'_k$.

The reader not interested in the proofs may at this point proceed
directly to Theorem 2 below: the I- and J-functions are defined
immediately after the statements of lemmas 2 and 3. The mathematical
techniques employed are related to those used in error-correcting codes
(for example, Lin (1970)) but the development given here is
self-contained.

5. Probability evaluations

For any $\zeta = (\xi, \eta)$ either $\eta = \delta(\xi)$ or not. The $2^m$
$\zeta$'s for which $\eta = \delta(\xi)$ are said to agree with $\delta$ and the
set of them is denoted $Z(\delta)$. Since all $2^m$ $\xi$'s have the same chance

$$p(z \mid \delta) = 2^{-m} \sum_{\zeta \in Z(\delta)} p(z \mid \zeta) \tag{5.1}$$

and

$$p(z \mid \zeta) = p^{s} q^{m+1-s}$$

where q = 1-p and s is the number of disagreements between z and $\zeta$,
a disagreement being where $x_i \neq \xi_i$ or where $\eta \neq y$. We refer to s
as the (Hamming) distance between z and $\zeta$.

The following simple result will be used repeatedly in the
argument. If $s_1$ is the number of disagreements in respect of
relevant factors and the response, and $s_2$ the number of disagreements
for irrelevant factors, so that $s = s_1 + s_2$, we may write

$$p(z \mid \zeta) = p^{s_1} q^{k+1-s_1} \cdot p^{s_2} q^{m-k-s_2} \tag{5.2}$$

($s_1$ is called the relevant distance). In summing $p(z \mid \zeta)$ over
$\zeta \in Z(\delta)$ all combinations of the irrelevant factors will occur since
they have no effect on $\eta$. Hence the second factor in (5.2) will yield
$(p+q)^{m-k} = 1$. Consequently in evaluating (5.1) it is enough to consider
only the relevant factors and the response. This is termed the
relevancy principle.

We proceed to the evaluation of $p(z \mid \delta)$ - equation (5.1) - when
$\delta$ is a k-factor interaction. There are two cases to consider according
as $z \in Z(\delta)$ or not. If $z \in Z(\delta)$ there is one $\zeta$ identical
with z and distant 0. There are no $\zeta$'s distant one since a disagreement
for a $\xi_i$ will mean a disagreement either for another factor or for the
response by the interaction property. There are two types of $\zeta$
distant 2: those with $\eta = y$ and two disagreements in relevant
factors; and those with $\eta \neq y$ and one disagreement in relevant
factors. There are $\binom{k}{2}$ of the first type and k of the second, a
total of $\binom{k+1}{2}$. (The other factors are omitted by the relevancy
principle.) Continuing with this line of reasoning we find that if
$z \in Z(\delta)$ there are $\binom{k+1}{s_1}$ $\zeta$'s at relevant distance
$s_1$, provided $s_1$ is even, and none if $s_1$ is odd. If $z \notin Z(\delta)$
the same result holds with odd and even interchanged. Hence

$$p(z \mid \delta_k) = 2^{-m} \sum_{s_1} \binom{k+1}{s_1} p^{s_1} q^{k+1-s_1}$$

where the summation is over even (odd) $s_1$ according as
$z \in (\notin)\ Z(\delta_k)$; $0 \le s_1 \le k+1$.
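
A numerical check of this two-valued form is straightforward for small m. The sketch below evaluates p(z | delta) by brute force from (5.1) for one interaction and compares the distinct values with the sums over even and odd numbers of errors, using the standard identity that the chance of an even number of successes in n Bernoulli trials is (1 + (q-p)^n)/2. The choice m = 4, the relevant set and the function name are assumptions for the example.

```python
import itertools

def likelihood(z, relevant, m, p, odd=True):
    """Brute-force p(z | delta) for the odd/even interaction on `relevant`."""
    x, y = z
    q = 1 - p
    total = 0.0
    for xi in itertools.product((1, -1), repeat=m):
        plus = sum(1 for i in relevant if xi[i] == 1)
        eta = 1 if (plus % 2 == 1) == odd else -1
        s = sum(a != b for a, b in zip(x, xi)) + (y != eta)  # Hamming distance
        total += p ** s * q ** (m + 1 - s)
    return total / 2 ** m

m, p, relevant = 4, 0.1, (0, 2)   # a 2-factor interaction among 4 factors
k = len(relevant)

# Distinct likelihood values over all 2^(m+1) observations z, smallest first.
values = sorted({round(likelihood((x, y), relevant, m, p), 12)
                 for x in itertools.product((1, -1), repeat=m) for y in (1, -1)})

# Chance of an even number of errors among the k+1 relevant recordings.
even_sum = 0.5 * (1 + (1 - 2 * p) ** (k + 1))
print(values)
print([(1 - even_sum) / 2 ** m, even_sum / 2 ** m])
```

The irrelevant factors (positions 1 and 3 here) drop out of the result, as the relevancy principle asserts.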
Lemma 1. In n Bernoulli trials with chance p of success, the


The theorem is obvious from lemmas 2 and 3 since $\Delta_{12} = E_{11} - E_{12}$.
The special, complementary case follows easily by following an argument
parallel to that used to establish lemma 3.

7. Expected size of experiment

The implications of Theorem 2 are best understood by referring to
the Figure which sketches the functions I(x) and J(x) in
$0 \le x \le 1/2$. Properties of these functions that are cited are all easily
established by use of the differential calculus and the proofs are
accordingly omitted. I(x) is decreasing and J(x) increasing in the
interval and they are equal to -log 2 at x = 1/2. Since $\alpha_j$ increases
with j, $\Delta(\delta_j : \delta_k)$, which is of course positive, decreases both
with j and with k: that is, the higher the order of the interactions (or
equivalently, the more factors involved in either the true or
alternative response) the more difficult it is to separate them.

As we saw in section 3, the expected size of experiment is inversely
proportional to the Kullback-Leibler number and with m factors the
least number will be $\Delta(\delta_{m-1} : \delta'_{m-1})$ in comparing one
(m-1)-factor interaction with another. (There are only 2 complementary
m-factor interactions which are more easily distinguished because of the
doubling of the Kullback-Leibler number that then occurs: see Theorem 2.)
Corollary 1. In the assumed model with m factors and attention
confined solely to interactions, the expected size of experiment to
distinguish adequately between all response functions is inversely
proportional to

$$\Delta(\delta_{m-1} : \delta'_{m-1}) = I(\alpha_m) - J(\alpha_m) = \tfrac{1}{2}(q-p)^m \log \frac{1 + (q-p)^m}{1 - (q-p)^m}.$$


The behaviour of this function with m can be appreciated by
noting that for large m it is approximately $(q-p)^{2m}$ so that the
increase of n with m is about exponentially fast. The following
values for p = 0.1 underline this point:

    m                      2       4       8       16                       32
    $\Delta_{m-1,m-1}$     .4852   .1782   .0284   $7.925 \times 10^{-4}$   $6.277 \times 10^{-7}$

so that an experiment with 32 factors (or medical symptoms) can be
expected to require about $10^6$ times more observations than one with
only 2 factors. As explained in section 2 these numbers are not to be
taken too seriously, they can only express orders of magnitude. Notice
that they are exaggerated by the fact that the worst case is being
discussed. With a distribution over $\delta$'s the sizes would fall. For
example, if one was sure that only k out of the m factors were
relevant (but not which k) then $\Delta_{k,k}$, not $\Delta_{m-1,m-1}$, would
indicate the order. Against this, only interactions are being considered,
$2^m$ out of the $2^{2^m}$ possible functions, so that even larger sizes are
possible: see section 8.
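
The quoted values can be recomputed in a few lines. The sketch assumes that the expression in Corollary 1 reads (1/2)(q-p)^m log[(1 + (q-p)^m)/(1 - (q-p)^m)]; this reading is an inference from the damaged text, supported by the fact that it reproduces the values quoted for p = 0.1. The function name `least_kl` is illustrative.

```python
import math

def least_kl(m: int, p: float) -> float:
    """(1/2) a log((1 + a) / (1 - a)) with a = (q - p)^m and q = 1 - p."""
    a = (1 - 2 * p) ** m
    return 0.5 * a * math.log((1 + a) / (1 - a))

p = 0.1
for m in (2, 4, 8, 16, 32):
    exact = least_kl(m, p)
    approx = (1 - 2 * p) ** (2 * m)   # large-m approximation (q - p)^(2m)
    print(f"m={m:2d}  exact={exact:.4g}  approx={approx:.4g}")

# How many times more units m = 32 needs than m = 2, for the same target log-odds.
ratio = least_kl(2, p) / least_kl(32, p)
print(f"ratio = {ratio:.3g}")
```

Because the required sample size is inversely proportional to this number, the ratio in the last line is the factor by which the experiment must grow.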


If $j > k$ in Theorem 2, $\Delta(\delta_j : \delta_k) > \Delta(\delta_k : \delta_j)$.
Taking k = 1 for illustration, if a main effect is the true response it is
harder to eliminate a j-factor (j > 1) interaction than it would be to
eliminate the main effect were the true response that j-factor interaction.

8.  General  response  functions 

A feature of interactions that greatly facilitates the computations
is that $p(z \mid \delta)$, as z ranges over all $2^{m+1}$ values, assumes only 2
values (Theorem 1) with the result that any E (equation (3.2)) only
contains 2 logarithmic terms. For a general response many more values
will arise and the calculation of E's, and hence $\Delta$'s, is formidable.
Some understanding of what happens with a general response can be
obtained by calculating a generalized Kullback-Leibler number

$$\Delta(\delta : \delta_1, \delta_2) = \sum_z p(z \mid \delta) \log \frac{p(z \mid \delta_1)}{p(z \mid \delta_2)} = E(\delta : \delta_1) - E(\delta : \delta_2) \tag{8.1}$$

where $\delta_1$ and $\delta_2$ are still interactions but $\delta$ is general. This is
the expected change in log-odds from one unit when comparing two
interactions but when the true response is $\delta$. Even in complete
factorial experiments (section 1) it is usual to do the analysis in
terms of the interactions even though the overall effect is not a pure
interaction. There any effect is a linear combination of interactions:
the same property holds here. Hence a study of interactions with a
general response is not inappropriate.

We now proceed to evaluate $E(\delta : \delta_i)$ where $\delta$ is arbitrary and
$\delta_i$ is an interaction. There are two ways to do this: either by an
extension of the argument already used when $\delta$ is an interaction or by
using the theory of vector spaces in which $\delta$ is expressed as a linear
combination of interactions. The second method is simpler and has the
added advantage that the structure of the situation is more clearly
revealed. The reader not interested in mathematical details may proceed
to the statement of Theorem 3 though equation (8.2) has to be understood
in order to appreciate the role of the w's in the statement of that
theorem.

To express any response function as a vector in $2^m$-space write
out the $2^m$ possible $\xi$'s in any fixed order. The Table gives such a
list for m = 3. The values $\eta(\xi)$ form a string of $2^m$ pluses and
minuses and provide a complete description of $\delta$: we shall denote this
vector by $\delta$. Such a representation of a $\delta$ is given in the fourth
column of the Table. For any two different $\delta$'s consider the scalar
product $\delta_1^T \delta_2$: the contribution to this will be +1 if $\delta_1$ and
$\delta_2$ agree at a particular place and -1 if not. Let N denote the number
of agreements between $\delta_1$ and $\delta_2$ so that $2^m - N$ is the distance
between them. Then

$$\delta_1^T \delta_2 = N - (2^m - N) = 2(N - 2^{m-1})$$

and we write

$$w = 2^{-m}\, \delta_1^T \delta_2 = 2^{-(m-1)} N - 1. \tag{8.2}$$

If $\delta_1 = \delta_2$ then $2^{-m} \delta_1^T \delta_2 = 1$; if $\delta_1$ and
$\delta_2$ are complementary $2^{-m} \delta_1^T \delta_2 = -1$. If $\delta_1$ and
$\delta_2$ are any two different and non-complementary interactions,
$N = 2^{m-1}$ and w = 0. It follows that the $2^m$ odd interactions form an
orthogonal basis for the vector representation of $\delta$'s. (The same holds
for the even interactions.) We may therefore write $\delta = \sum_i a_i \delta_i$
where $\delta$ is arbitrary, $\{\delta_i\}$ is the set of odd interactions and
$\{a_i\}$ is a set of numbers to be found. For interaction $\delta_s$,
$\delta^T \delta_s = \sum_i a_i \delta_i^T \delta_s = a_s 2^m$ and consequently from
(8.2) $w_s = a_s$. We therefore have the result that any response can be
written as a linear combination of odd interactions with weights $w_s$ equal
to $2^{-(m-1)} N_s - 1$ where $N_s$ is the number of agreements between $\delta$
and the interaction $\delta_s$: equation (8.2).
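
The decomposition can be verified numerically for m = 3. The sketch below uses the response discussed later in this section (+ when the first factor is + and at least one of the other two is +), computes the weights from the numbers of agreements, and rebuilds the response from the odd interactions. The variable names and the ordering of factor combinations are choices made for this illustration.

```python
import itertools

m = 3
settings = list(itertools.product((1, -1), repeat=m))  # the 2^m factor combinations

def odd_interaction(relevant):
    """Vector of responses of the odd interaction on the positions in `relevant`."""
    return [1 if sum(xi[i] == 1 for i in relevant) % 2 == 1 else -1 for xi in settings]

subsets = [c for k in range(m + 1) for c in itertools.combinations(range(m), k)]
basis = {c: odd_interaction(c) for c in subsets}

# Example response: + when factor 1 is + and factor 2 or factor 3 (or both) is +.
delta = [1 if xi[0] == 1 and (xi[1] == 1 or xi[2] == 1) else -1 for xi in settings]

weights = {}
for c, vec in basis.items():
    n_agree = sum(a == b for a, b in zip(delta, vec))   # N for this interaction
    weights[c] = n_agree / 2 ** (m - 1) - 1             # w = 2^-(m-1) N - 1

# Rebuild the response as the weighted sum of odd interactions.
rebuilt = [sum(weights[c] * basis[c][r] for c in subsets) for r in range(2 ** m)]
print(weights)
print(rebuilt == delta, sum(weights.values()))
```

Negative weights appear for the interactions that agree with the response in fewer than half the places; replacing such an interaction by its complement changes the sign of the weight.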

Consider the $2^m$ vectors of all odd interactions to form a square
matrix. Then the number of minuses in every row of this matrix will
be $2^{m-1}$ except for one row in which it will be $2^m$. (The Table again
illustrates for m = 3.) This exceptional row will be that
corresponding to the $\xi$ for which every $\xi_i$ is -. If $\delta$ has also -
in this row (if not, then use the representation in terms of even
interactions) the total number of agreements between $\delta$ and all odd
interactions will be $\sum_s N_s = 2^m + (2^m - 1) 2^{m-1}$ from which it
follows that

$$\sum_s w_s = 1$$

so that

$$p(z \mid \delta) = 2^{-m} \sum_{\xi} p(x \mid \xi) \sum_s w_s\, p(y \mid \delta_s(\xi)) = 2^{-m} \sum_s w_s \sum_{\xi} p(x \mid \xi)\, p(y \mid \delta_s(\xi)) = \sum_s w_s\, p(z \mid \delta_s).$$

In words, the representation of $\delta$ in terms of interactions is also the
representation of $p(z \mid \delta)$ in terms of $p(z \mid \delta_s)$. (Note that
although $\sum_s w_s = 1$, the w's are not necessarily positive so that this is
not a straightforward probability result.) It immediately follows from (8.1)
that

$$E(\delta : \delta_i) = \sum_s w_s E(\delta_s : \delta_i)$$

and the $E(\delta_s : \delta_i)$-values are known from lemmas 2 and 3. Hence, again

$$E(\delta : \delta_i) = w_i I(\alpha_{i+1}) + (1 - w_i) J(\alpha_{i+1}) - m \log 2$$

where i is the order of the interaction $\delta_i$.

Theorem 3. For any response $\delta$ and for different and
non-complementary interactions $\delta_s$ and $\delta_t$ of orders s and t
respectively the generalized Kullback-Leibler number
$\Delta(\delta : \delta_s, \delta_t)$ is

$$w_s I(\alpha_{s+1}) + (1 - w_s) J(\alpha_{s+1}) - w_t I(\alpha_{t+1}) - (1 - w_t) J(\alpha_{t+1}). \tag{8.3}$$

Here $N_s$ is the number of agreements between $\delta$ and $\delta_s$ and
$w_s = 2^{-(m-1)}(N_s - 2^{m-1})$.

The result follows directly from the evaluation of $E(\delta : \delta_s)$ and
$E(\delta : \delta_t)$.

If $\delta = \delta_s$, $w_s = 1$ and $w_t = 0$ and we have Theorem 2, the special
case of that Theorem being excluded.

Suppose $R(\delta_t)$ is not contained in $R(\delta)$, then in the
representation of $\delta$, $\delta_t$ will not appear and $w_t = 0$. If
$R(\delta_t)$ is contained in $R(\delta)$ then by the relevancy principle we may
confine our attention to the factors in $R(\delta)$. If these are k in number
$w_t = 2^{-(k-1)}(N_t - 2^{k-1})$ where $N_t$ refers only to agreements in
$R(\delta)$.

To understand what is happening, suppose $\delta$ is any response with
$R(\delta)$ containing k (< m) factors. If two interactions not in $R(\delta)$
are compared (8.3) reduces to $J(\alpha_{s+1}) - J(\alpha_{t+1})$ which has the
sign of (s-t) so that a lower order interaction will appear less probable
than a higher order one. If two interactions of the same order,
s = t, are compared, one within $R(\delta)$ and one not, (8.3) reduces to
$w_s\{I(\alpha_{s+1}) - J(\alpha_{s+1})\}$ which is positive. Hence an interaction
involving the k relevant factors will appear more probable than one of
the same order involving some irrelevant factors. Of interactions
within $R(\delta)$ of a given order that with the maximum weight will
predominate. It is not possible to make general statements about
interactions of different orders within $R(\delta)$ since the difference
between $I(\alpha_{s+1})$ and $I(\alpha_{t+1})$ (and between the J's) depends on p
whereas that between $w_s$ and $w_t$ depends solely on $\delta$ and not p.

The Table gives an example with 3 factors. The first 3 columns are
the possible $\xi_i$'s making up the 8 possible $\xi$. The next column gives
$\eta$ for a $\delta$ in which the response is + if $\xi_1$ is + and, in
addition, either $\xi_2$ or $\xi_3$, or both, are +: so k = m = 3. The
remaining columns give $\eta$ for the 8 odd interactions including the
zero-factor interaction that always has a negative response. Under
these last eight columns are listed the N's, the numbers of agreements
between $\delta$ and the interaction of that column. In 3 cases
$N < 2^{m-1} = 4$ so the number $2^m - N$ is listed below for the
complementary even interaction. The resulting N's all exceed 4 and
the w's all exceed 0. (This procedure of switching to complementary
interactions is quite general.) The next row lists the w's and the
final row provides the values of $E(\delta : \delta_s)$ for each interaction
$\delta_s$. The generalized Kullback-Leibler numbers are differences of E's
and are not listed.


Consider the main effect of $\xi_1$ in comparison with that of $\xi_2$
(or $\xi_3$). The difference of the E's is -.59 - (-.84) = +.25 and so
$\xi_1$ will appear more probable than the other two, and will dominate all
other interactions since it has the largest E. The next most important
interaction is the 3-factor one which will slightly dominate the 3
2-factor ones, and all will dominate the $\xi_2$ and $\xi_3$ main effects.
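
The E-values quoted in this example can be reproduced under three assumptions that are not stated in the surviving text and should be treated as such: that I(x) = x log x + (1-x) log(1-x) and J(x) = (1/2) log[x(1-x)], forms consistent with the properties cited in section 7; that alpha_j = (1 - (q-p)^j)/2; and that the tabulated values use p = 0.1 and omit the common term -m log 2. The sketch below is a check under those assumptions, with illustrative function names.

```python
import math

def I(x):
    """Assumed form: x log x + (1 - x) log(1 - x)."""
    return x * math.log(x) + (1 - x) * math.log(1 - x)

def J(x):
    """Assumed form: (1/2) log(x (1 - x))."""
    return 0.5 * (math.log(x) + math.log(1 - x))

def alpha(j, p):
    """Chance of an odd number of errors among j independent recordings."""
    return 0.5 * (1 - (1 - 2 * p) ** j)

def E_without_constant(w, order, p):
    """w I(alpha_{order+1}) + (1 - w) J(alpha_{order+1}), omitting -m log 2."""
    a = alpha(order + 1, p)
    return w * I(a) + (1 - w) * J(a)

p = 0.1  # assumed, as in the numerical illustration of section 7
table = {
    "main effect, factor 1": E_without_constant(0.75, 1, p),
    "main effect, factor 2 or 3": E_without_constant(0.25, 1, p),
    "2-factor": E_without_constant(0.25, 2, p),
    "3-factor": E_without_constant(0.25, 3, p),
}
for name, value in table.items():
    print(f"{name:28s} {value:.2f}")
```

Since the omitted constant is the same for every interaction, it cancels in the differences of E's that form the generalized Kullback-Leibler numbers.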

Another interesting response (not tabulated) is that in which $\eta$
is + only when all of $\xi_1, \xi_2, \ldots, \xi_k$ are +. In medical language the
disease is only present if k (true) symptoms are present. Such a $\delta$
differs from a zero-factor interaction in only one place and hence has
$2^{k-1} + 1$ agreements with any other interaction (allowing for
complements) giving $w_s = 2^{-(k-1)}$ for all interactions involving some
or all of the $\xi_1, \ldots, \xi_k$. It is not difficult to show that
$w I(\alpha_s) + (1 - w) J(\alpha_s)$ increases in s if $w < 1/2$ so that (8.3)
shows that the k-factor interaction of $\xi_1, \ldots, \xi_k$ will dominate all
other interactions in $R(\delta)$ and, by the general result, will dominate
all k-factor interactions.


9.  Conclusions 

The main lesson to be learnt from this analysis is that there are
situations in which the increasing complexity that inevitably arises
from an increase in the number of factors gives rise to a need for more
observations to unravel the complexity; and that this is in contrast to
other familiar statistical situations in which the observational
explosion does not take place. The practical consequence of this is
that statisticians should be wary of data analyses involving many
factors because, if a model like that studied here is reasonable, there
just may not be enough observations to permit a satisfactory analysis,
so that any data analysis must be a waste of time. Statisticians are
often appealed to "to make sense out of this data": they should resist
the temptation to do so without first checking that the extraction is
possible. Rather their talents should be directed toward sensible
designs and scientists encouraged to do planned experimentation rather
than idle data collection.

The analysis has also shown the usefulness of the Kullback-Leibler
distance as a way of separating the possibilities and providing
discrimination between them. A useful way of understanding what is
happening with this model is to think of the "space" of possible $\delta$'s
packed so tight that the "distances" between $\delta$'s become small. In
contrast, the m-factor situation mentioned in the introduction has $\delta$'s
much more loosely packed so that discrimination is easier.
Table for three factors (m = 3)

                          Odd interactions
                   0-factor   1-factor        2-factor             3-factor
  ξ1  ξ2  ξ3    δ             ξ1  ξ2  ξ3     ξ1ξ2  ξ2ξ3  ξ3ξ1      ξ1ξ2ξ3

  +   +   +     +      -      +   +   +       -     -     -          +
  +   +   -     +      -      +   +   -       -     +     +          -
  +   -   +     +      -      +   -   +       +     +     -          -
  +   -   -     -      -      +   -   -       +     -     +          +
  -   +   +     -      -      -   +   +       +     -     +          -
  -   +   -     -      -      -   +   -       +     +     -          +
  -   -   +     -      -      -   -   +       -     +     +          +
  -   -   -     -      -      -   -   -       -     -     -          -

  N:                   5      7   5   5       3     5     3          3
                                             (5)         (5)        (5)
  w:                  1/4    3/4 1/4 1/4     1/4   1/4   1/4        1/4
  E:                        -.59 -.84 -.84  -.77  -.77  -.77       -.74

N is the number of agreements between $\delta$ and the interaction of that
column: when this is less than $2^{m-1}$ (here 4), N for the complementary
interaction is given in brackets. $w = 2^{-(m-1)} N - 1$ (equation (8.2))
with $N \ge 2^{m-1}$. $E = E(\delta : \delta_s)$ for interaction $\delta_s$
(equation (3.2)).